Evaluation of Partial Measurement Invariance
Under Sparse Ordinal Indicators

Using Induced Dirichlet Threshold Priors

Fatih Ozkan & Jianwen Song

July 16, 2025

Contents

  1. Introduction
  2. Study Aims and Rationale
  3. Bayesian Framework
  4. Worked Example
  5. References

Introduction

  • This research builds directly on Padgett et al. (2024), who proposed an induced-Dirichlet prior for threshold parameters in Bayesian SEM to address sparse response patterns in ordinal indicators.
  • While Padgett and colleagues demonstrated the improved regularization and sampling efficiency of this approach, our study extends their work by applying the induced-Dirichlet threshold prior within a multi-group Bayesian factor analysis framework to evaluate partial measurement invariance without collapsing categories under varying conditions of sparsity.

Introduction (cont.)

  • Motivation

    • Many educational and psychological assessments rely on ordinal survey items administered across diverse groups, where ensuring comparability of latent constructs is essential.

    • Sparse responses in less-endorsed categories (e.g., extreme options) lead to unstable parameter estimates and biased invariance testing.

  • Current methods for addressing sparse ordinal data in invariance testing

    • Frequentist approaches often collapse rare categories or remove problematic items, sacrificing information and statistical power.
    • Bayesian models with sequential normal priors can struggle with convergence and inflated uncertainty when data are highly sparse.

    • Existing ad-hoc fixes lack principled priors to regularize threshold estimation under sparse conditions.

A Prior‐draw Diagnostic for the Induced‐Dirichlet Threshold Prior

3D Prior Densities of Ordered Thresholds under Varying Dirichlet Concentrations


Study Aims and Rationale

    • Although Padgett et al. showed stabilization in single-group settings, they did not evaluate multi-group threshold comparisons under partial invariance, an important limitation for cross-cultural and longitudinal research.
    • Our study extends their work by simulating two-group ordinal data with deliberate sparsity and imposing partial measurement invariance constraints on loadings and thresholds.
    • We will fit multi-group Bayesian factor models using both induced-Dirichlet and traditional normal priors, then compare bias, convergence (R̂, ESS), and credible-interval precision without collapsing categories.
    • We also explore how different α-vectors (uniform vs. sparsity-informed) affect the accuracy of detecting which thresholds truly differ across groups.
    • By doing so, we aim to demonstrate whether the induced-Dirichlet prior not only stabilizes single-group threshold estimates but also robustly supports partial invariance testing in the presence of extreme sparsity.

Research Question

    • How effectively does the induced-Dirichlet threshold prior facilitate the evaluation of partial measurement invariance in multi-group Bayesian factor analysis when ordinal indicators exhibit sparse response patterns?

Multi-Group Factor Model with Partial Invariance

The model assumes \(G\) groups (e.g., \(G = 2\)), with partial invariance on factor loadings and thresholds.

  • Measurement Equation: For individual \(n\) in group \(g\), item \(j\), and category \(k\):

    \[ y_{gnj} \sim \text{Categorical}(\pi_{gnj}), \quad \pi_{gnj} = (\pi_{gnj1},\,\pi_{gnj2},\,\dots,\,\pi_{gnjC}) \]

    where

    \[ \pi_{gnjk} = P(y_{gnj}=k) = \Phi\bigl(t_{gjk} - \eta_{gnj}\bigr) - \Phi\bigl(t_{gj,k-1} - \eta_{gnj}\bigr), \quad \eta_{gnj} = \lambda_{gj}\,f_{gn}. \]

    Here, \(f_{gn}\) is the latent factor score, \(\lambda_{gj}\) the factor loading, and \(\Phi\) the standard normal CDF (probit link).

  • Partial Invariance Constraints:

    • Shared loadings (items 2–4):
      \[ \lambda_{gj} = \lambda_j^{\mathrm{shared}},\quad j=2,3,4,\ \forall g. \]

    • Group-specific loading (item 5):
      \[ \lambda_{g5} = \lambda_{5g},\quad g=1,2. \]

    • Identification (item 1):
      \[ \lambda_{g1} = 1,\quad \forall g. \]
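
As a concrete numerical illustration of the measurement equation above, the short R sketch below computes the category probabilities \(\pi_{gnjk}\) for a single person-item pair; the threshold and loading values are illustrative placeholders, not estimates from the study.

```r
# Category probabilities for one person-item pair under the probit graded model:
# pi_k = Phi(t_k - eta) - Phi(t_{k-1} - eta), with t_0 = -Inf and t_C = +Inf.
category_probs <- function(thresholds, eta) {
  cuts <- c(-Inf, thresholds, Inf)                         # augment with the open ends
  pnorm(cuts[-1] - eta) - pnorm(cuts[-length(cuts)] - eta)
}

# Illustrative values: three ordered thresholds (C = 4 categories), eta = lambda * f.
category_probs(thresholds = c(-2, -0.5, 1), eta = 0.7 * 0.3)   # sums to 1
```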

Induced-Dirichlet Threshold Prior

  • Dirichlet Prior on Category Probabilities:

    \[ \mathbf{p}_{gj} = (p_{gj1},\,p_{gj2},\,\dots,\,p_{gjC}) \sim \mathrm{Dirichlet}(\alpha_{gj1},\,\alpha_{gj2},\,\dots,\,\alpha_{gjC}), \quad \sum_{k=1}^C p_{gjk} = 1. \]

    where \(\alpha_{gj}=(\alpha_{gj1},\dots,\alpha_{gjC})\) is the concentration-parameter vector.

    – Uniform Prior: \(\alpha_{gj}=(1,1,\dots,1)\).
    – Sparsity-Informed: \(\alpha_{gj}=(0.5,1,1,0.5)\) for \(C=4\).

  • Threshold Transformation:

    \[ t_{gjk} = \Phi^{-1}\!\Bigl(\sum_{i=1}^k p_{gji}\Bigr), \quad k=1,2,\dots,C-1, \]

    ensuring \(t_{gj1}<t_{gj2}<\dots<t_{gj,C-1}\).

  • Log‐Density:

    \[ \log p(\mathbf{p}_{gj}\mid\alpha_{gj}) = \log\Gamma\!\Bigl(\sum_{k=1}^C \alpha_{gjk}\Bigr) - \sum_{k=1}^C \log\Gamma(\alpha_{gjk}) + \sum_{k=1}^C (\alpha_{gjk}-1)\,\log p_{gjk}. \]
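
A minimal base-R sketch of this construction for one group-item pair is shown below: a Dirichlet draw over the \(C\) category probabilities (sampled via normalized Gamma variates), the cumulative-probit map to ordered thresholds, and the log-density above. The α-vectors match the uniform and sparsity-informed settings; everything else is illustrative.

```r
# Induced-Dirichlet threshold prior for one group-item pair (C = 4 categories).

rdirichlet_one <- function(alpha) {          # Dirichlet draw via normalized Gammas
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

induced_thresholds <- function(p) {          # t_k = Phi^{-1}(p_1 + ... + p_k), k < C
  qnorm(cumsum(p)[-length(p)])
}

dirichlet_logpdf <- function(p, alpha) {     # log-density shown above
  lgamma(sum(alpha)) - sum(lgamma(alpha)) + sum((alpha - 1) * log(p))
}

set.seed(1)
alpha_uniform  <- c(1, 1, 1, 1)              # uniform concentration
alpha_sparsity <- c(0.5, 1, 1, 0.5)          # sparsity-informed alternative

p_draw <- rdirichlet_one(alpha_uniform)
induced_thresholds(p_draw)                   # strictly increasing by construction
dirichlet_logpdf(p_draw, alpha_uniform)
```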

Traditional Normal Prior

  • Thresholds:

    \[ t_{gjk}\sim N(0,\sigma_t^2), \quad t_{gj,k-1}<t_{gjk}. \]

    – Prior Specification: \(\sigma_t^2 = 25\) (i.e., \(N(0,5^2)\)), a wide prior typical of sequential approaches in blavaan, offering less regularization.
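
For contrast with the induced-Dirichlet construction, the short sketch below draws threshold sets from this wide \(N(0,5^2)\) prior, using sorting as a convenient stand-in for the ordering constraint; the diffuse quantiles illustrate why it offers little regularization under sparsity.

```r
# Ordered threshold sets drawn from the wide sequential-normal prior N(0, 5^2);
# sorting is used here only to visualize the ordering constraint.
set.seed(2)
prior_draws <- t(replicate(1000, sort(rnorm(3, mean = 0, sd = 5))))
apply(prior_draws, 2, quantile, probs = c(0.025, 0.5, 0.975))   # very diffuse cutpoints
```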

Other Model Parameters

  • Factor Loadings:

    • Shared:
      \[ \lambda_j^{\mathrm{shared}}\sim N(0,1.5^2),\quad j=2,3,4. \]

    • Group‐specific:
      \[ \lambda_{5g}\sim N(0,1.5^2),\quad g=1,2. \]

    • Prior Specification: Variance \(1.5^2 = 2.25\) balances informativeness and flexibility.

  • Factor Variances:

    \[ \sigma_{fg}^2 \sim \mathrm{Gamma}(2,1),\quad g=1,2. \]

    – Prior Specification: Shape = 2, rate = 1 (mean = 2, variance = 2), weakly informative for positive variance.

  • Factor Mean (Group 2):

    \[ \mu_{f2}\sim N(0,1). \]

    – Prior Specification: Weakly informative.
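
A quick prior-predictive draw of these remaining parameters (a sanity check on scale, not part of the estimation itself) might look like:

```r
# One prior-predictive draw of the non-threshold parameters.
set.seed(3)
lambda_shared <- rnorm(3, mean = 0, sd = 1.5)    # shared loadings, items 2-4
lambda_5g     <- rnorm(2, mean = 0, sd = 1.5)    # group-specific loadings, item 5
sigma2_f      <- rgamma(2, shape = 2, rate = 1)  # factor variances, groups 1-2
mu_f2         <- rnorm(1, mean = 0, sd = 1)      # factor mean, group 2
round(c(lambda_shared, lambda_5g, sigma2_f, mu_f2), 2)
```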

Likelihood

For group \(g\), individual \(n\), item \(j\):

\[ \log L_{gnj} = \begin{cases} \log\Phi(t_{gj1}-\eta_{gnj}), & y_{gnj}=1,\\[0.25em] \log\bigl[1 - \Phi(t_{gj,C-1}-\eta_{gnj})\bigr], & y_{gnj}=C,\\[0.25em] \log\bigl[\Phi(t_{gjk}-\eta_{gnj}) - \Phi(t_{gj,k-1}-\eta_{gnj})\bigr], & 1<y_{gnj}<C. \end{cases} \]

\[ \log L = \sum_{g=1}^G\sum_{n=1}^{N_g}\sum_{j=1}^J \log L_{gnj}. \]
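
The piecewise cases collapse into a single expression once the thresholds are augmented with \(\pm\infty\); a self-contained R sketch of the per-response contribution and the triple sum is below (the list-of-matrices data layout is an assumption made only for illustration).

```r
# Log-likelihood of one response y in {1, ..., C}: augmenting the thresholds with
# -Inf and +Inf turns the three cases above into a single expression.
loglik_one <- function(y, thresholds, eta) {
  cuts <- c(-Inf, thresholds, Inf)
  log(pnorm(cuts[y + 1] - eta) - pnorm(cuts[y] - eta))
}

# Total log-likelihood: `y` and `eta` are lists (one element per group) of
# N_g x J matrices; `thr` is a list of J x (C - 1) threshold matrices.
loglik_total <- function(y, thr, eta) {
  total <- 0
  for (g in seq_along(y)) {
    for (n in seq_len(nrow(y[[g]]))) {
      for (j in seq_len(ncol(y[[g]]))) {
        total <- total + loglik_one(y[[g]][n, j], thr[[g]][j, ], eta[[g]][n, j])
      }
    }
  }
  total
}
```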

Posterior Distribution

\[ \pi(\theta\mid y) \;\propto\; L(y\mid\theta) \times \prod_{g=1}^G\prod_{j=1}^J \mathrm{Dirichlet}(p_{gj}\mid\alpha_{gj}) \times \pi\bigl(\lambda^{\mathrm{shared}},\lambda_{5g},\sigma_{fg},\mu_{f2}\bigr), \]

where \(\theta=\{p_{gj},\lambda^{\mathrm{shared}},\lambda_{5g},f_{gn},\sigma_{fg},\mu_{f2}\}\) and \(L(y\mid\theta)\) is the likelihood whose logarithm is given above.
Under the traditional prior, replace the Dirichlet term with

\[ \prod_{g=1}^G\prod_{j=1}^J\prod_{k=1}^{C-1} N(t_{gjk}\mid0,5^2),\quad t_{gj,k-1}<t_{gjk}. \]

Stan Approximation

  • Induced-Dirichlet Approximation:

    \[ t_{gjk} \sim N(0,1), \quad t_{gj,k-1} < t_{gjk}, \]

  • Traditional Normal:

    \[ t_{gjk} \sim N(0,5^2), \quad t_{gj,k-1} < t_{gjk}. \]

Study Design

A. Monte Carlo Simulation

  • Goal: Test induced-Dirichlet vs. sequential normal vs. category-collapsing under varying sparsity & partial invariance.
  • Setup:
    • 2 groups (G = 2), 8 items (I = 8)
    • Sparsity levels: nsparse = 0, 2, 4 × psparse = 0.02, 0.05
    • N = 500 per group; true loadings λ = 0.7
    • Thresholds τ: sparse = [–3, –0.8, 0.8], non-sparse = [–2, –0.5, 1]
    • Partial invariance: item 5 free, items 2–4 constrained
  • Methods compared:
    1. Induced-Dirichlet (N(0,1) prior on probs → ordered τ)
    2. Sequential Normal (N(0,5²) on τ with ordering)
    3. Category-Collapsing (lavaan DWLS, merge sparse categories)
  • MCMC & metrics: 4 chains × (1 000 warmup + 2 000 samples), check \(\hat R\), ESS; report bias/precision in λ & τ; lavaan fit (CFI, RMSEA).
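
As a rough data-generating sketch for one replication of this design (true \(\lambda = 0.7\), the sparse and non-sparse threshold vectors above, \(N = 500\) per group): the group-2 item-5 loading and factor-mean shift are hypothetical values chosen only to break invariance, and the full nsparse × psparse grid is not reproduced here.

```r
# Simulate two-group ordinal data for one replication (simplified sketch).
set.seed(4)
N <- 500; J <- 8; C <- 4; n_sparse <- 2

tau_sparse    <- c(-3, -0.8, 0.8)    # thresholds for sparse items
tau_nonsparse <- c(-2, -0.5, 1.0)    # thresholds for non-sparse items
tau <- lapply(seq_len(J), function(j) if (j <= n_sparse) tau_sparse else tau_nonsparse)

simulate_group <- function(lambda, f_mean = 0) {
  f <- rnorm(N, mean = f_mean, sd = 1)                  # latent factor scores
  y <- matrix(NA_integer_, N, J)
  for (j in seq_len(J)) {
    cuts  <- c(-Inf, tau[[j]], Inf)
    cdf   <- sapply(cuts, function(t) pnorm(t - lambda[j] * f))   # N x (C + 1)
    probs <- cdf[, -1] - cdf[, -(C + 1)]                          # category probabilities
    y[, j] <- apply(probs, 1, function(p) sample.int(C, size = 1, prob = p))
  }
  y
}

lambda_g1 <- rep(0.7, J)
lambda_g2 <- replace(rep(0.7, J), 5, 0.5)   # item 5 non-invariant (hypothetical value)
dat <- list(group1 = simulate_group(lambda_g1),
            group2 = simulate_group(lambda_g2, f_mean = 0.2))  # hypothetical mean shift
sapply(dat, function(y) mean(y == 1))       # check sparseness of the lowest category
```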

B. Real-Data Application (Gallup “Thriving Index”)

  • Data: Subset of Gallup World Poll (Padgett et al., 2024)
    • 3 countries (US, Norway, Turkey) as “groups”
    • 8 ordinal thriving items
  • Preprocessing:
    • Identify “sparse” categories (cell count < 5 → combine)
    • Partial invariance test: free loadings on item 5, constrain items 2–4
  • Same methods & settings:
    • Induced-Dirichlet vs. sequential normal on τ
    • Category-collapsing in lavaan DWLS
    • 4 chains × (1 000 warmup + 2 000 samples)
  • Evaluation:
    • Convergence: \(\hat R\), ESS
    • Partial‐invariance: Δχ² tests, ΔCFI
    • Posterior summaries: λ bias & precision; estimated τ’s
    • Fit: CFI, RMSEA (lavaan only)
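
For the lavaan comparison, a sketch of the full- versus partial-invariance fits is below; the data object (`gallup`), item names, and the exact partial constraints freed for item 5 are placeholders for the actual Gallup variables.

```r
library(lavaan)

# One-factor model over the eight thriving items (names are placeholders).
model <- 'thriving =~ item1 + item2 + item3 + item4 + item5 + item6 + item7 + item8'

# Full invariance: loadings and thresholds equal across the three countries.
fit_full <- cfa(model, data = gallup, group = "country",
                ordered = paste0("item", 1:8), estimator = "DWLS",
                group.equal = c("loadings", "thresholds"))

# Partial invariance: free item 5's loading (and, e.g., its first threshold).
fit_partial <- cfa(model, data = gallup, group = "country",
                   ordered = paste0("item", 1:8), estimator = "DWLS",
                   group.equal = c("loadings", "thresholds"),
                   group.partial = c("thriving =~ item5", "item5 | t1"))

# Delta chi-square and delta CFI for the partial-invariance decision.
lavTestLRT(fit_partial, fit_full)
fitMeasures(fit_full, "cfi") - fitMeasures(fit_partial, "cfi")
```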

Findings: Simulation

Prior vs Posterior Densities

  • Left: Item 2 loading
    • Blue shaded curve: prior \(N(0,5^2)\) for \(\lambda_2\) (mean 0, wide spread)
    • Black line: induced-Dirichlet posterior concentrates tightly around the true loading (dashed black)
    • Red shaded curve: sequential-normal posterior also concentrated, but slightly shifted relative to the Dirichlet
  • Right: Threshold \(\tau_{5,1,2}\) (group 1, item 5, 2nd cutpoint)
    • Blue: prior on \(\tau\) (wide Normal)
    • Black: induced-Dirichlet posterior (sharp peak at true value, dashed black)
    • Red: sequential-normal posterior (sharp but slightly biased, dashed red)

Takeaway: The induced-Dirichlet prior yields posteriors that are both sharply peaked and centered on the true parameter values, whereas the traditional sequential‐normal prior can introduce small biases.

Results: Coverage & CI Width

Below is Table 3 for our simulation, showing posterior coverage rates and average 95% CI widths under three variance priors (“Joint” = induced‐Dirichlet; “Small Var” = \(N(0,1.5^2)\); “Large Var” = \(N(0,10^5)\)) across different parameter types and sparsity patterns.

Posterior coverage and CI width depended on prior and sparsity:

| Parameter | Distribution (# sparse items) | Joint Cov (%) | Small Var Cov (%) | Large Var Cov (%) | Joint CI Width | Small Var CI Width | Large Var CI Width |
|---|---|---|---|---|---|---|---|
| Loadings | Symmetric (0) | 93.5 | 93.4 | 92.9 | 0.68 | 0.68 | 0.68 |
| Loadings | Sparse (2) | 91.8 | 91.8 | 93.1 | 0.86 | 0.84 | 0.86 |
| Loadings | Sparse (4) | 19.6 | 14.8 | 55.6 | 0.28 | 0.29 | 0.31 |
| Factor Variances | Symmetric (0) | 92.2 | 92.0 | 92.5 | 0.91 | 0.89 | 0.96 |
| Factor Variances | Sparse (2) | 88.7 | 89.0 | 89.5 | 0.95 | 0.95 | 0.96 |
| Factor Variances | Sparse (4) | 75.3 | 72.1 | 77.1 | 0.30 | 0.28 | 0.33 |
| Factor Covariance | Symmetric (0) | 90.9 | 90.5 | 90.6 | 0.29 | 0.28 | 0.29 |
| Factor Covariance | Sparse (2) | 85.0 | 84.5 | 84.9 | 0.28 | 0.30 | 0.28 |
| Factor Covariance | Sparse (4) | 68.2 | 65.7 | 71.0 | 0.27 | 0.27 | 0.32 |
| Thresholds | Symmetric (0) | 94.8 | 94.1 | 94.4 | 0.55 | 0.55 | 0.56 |
| Thresholds | Sparse (2) | 88.9 | 89.0 | 89.4 | 0.58 | 0.57 | 0.62 |
| Thresholds | Sparse (4) | 16.2 | 14.8 | 20.0 | 1.17 | 1.18 | 1.20 |

Prior vs Posterior

  • Lambda[2]
    • The blue curve (prior) is relatively wide and centered near 0.
    • The black curve (Dirichlet posterior) is much tighter and peaks almost exactly at the true λ₂ (vertical dashed black line).
    • The red curve (sequential‐normal posterior) is even narrower but slightly shifted right of the truth (vertical dashed red line), indicating a small positive bias under the sequential prior.
  • Tau[5,1,2]
    • The blue prior on the threshold is broad and symmetric around zero.
    • The black Dirichlet posterior again concentrates tightly at the true threshold (dashed black line just left of zero).
    • The red sequential posterior is also narrow but shows a slight negative shift (dashed red line), reflecting modest underestimation of the true τ₅,₁,₂ under the normal prior.

Findings: Real-World Dataset

Posterior Distributions of Item Loadings (λ)

Comparing Dirichlet (blue) vs. Sequential (red) posteriors for each item:

  • Concentration around 0:
    For all eight items, the Dirichlet densities are markedly narrower and more peaked at the true loading (≈0 on this scale), indicating greater precision under the induced-Dirichlet prior.

  • Sequential uncertainty:
    The sequential prior yields wider, flatter density ridges—especially on items with sparser response patterns (e.g. Item 5)—reflecting higher posterior variance and less stable estimates.

  • Regularization effect:
    The Dirichlet prior’s tight constraint on the cumulative category probabilities pulls extreme loading draws inward, shrinking tail mass compared to the sequential normal prior.

  • Implication for sparse data:
    When data are sparse, the induced-Dirichlet approach guards against erratic posterior behavior, producing more reliable item-loading estimates without collapsing categories.

Ridgeline plot of λ posteriors

Results

  • Prior vs Posterior (Fig 1):
    • Induced-Dirichlet’s tight \(N(0,1)\) prior → very concentrated posteriors for \(\lambda_2\) (true = 0.7) and \(\tau_{5,1,2}\) (true = 0).
    • Sequential \(N(0,5^2)\) prior yields much broader, more variable posteriors.
  • Diagnostics (Fig 2):
    • R̂: Dirichlet chains consistently near 1.06–2.20 (≲ 2.2), sequential wider, collapsed often > 2.
    • ESS: Dirichlet ≈ 260–300+, sequential ≈ 850–900, collapsed ≈ 250–300.
    • Bias: Dirichlet smallest in \(\lambda\) and \(\tau\), sequential moderate, collapsed largest.
  • Trace Plots (Fig 3):
    • Dirichlet (blue): stable, minimal drift around true values over 8 000 iters.
    • Sequential (red): higher variability, especially for \(\tau_{5,1,2}\) under sparsity.
  • Summary:
    The induced-Dirichlet prior regularizes thresholds, reduces bias, improves convergence & precision in sparse‐data invariance tests.

Real-World Dataset

Discussion

  • Dirichlet Prior Superiority:
    Consistently outperformed the sequential normal and category-collapsing approaches in bias reduction, precision, and convergence as sparsity (\(n_{\mathrm{sparse}} = 0, 2, 4\)) increased under partial invariance.

  • Threshold Stabilization:
    The tight \(N(0,1)\)-scale prior induced through the Dirichlet on category probabilities → narrow posteriors for \(\lambda_2\) (true = 0.7) and \(\tau_{5,1,2}\) (true = 0) even in sparse data, confirming Padgett et al.'s (2024) observations.

  • Diagnostics & Trace Plots:
    Dirichlet chains (blue) show R̂ ≲ 2.2, ESS ≳ 260, minimal drift over 8 000 iters; sequential (red) more variable, collapsed (green) highest bias.

  • Concentration Parameter Role:
    α-vector tuning shrinks extreme categories, maintaining precision when \(n_{\mathrm{sparse}} = 4\) and \(p_{\mathrm{sparse}} = .05\), supporting flexibility in sparse-data settings.

  • Implications & Future Work:
    Induced‐Dirichlet prior is a robust tool for partial invariance with sparse ordinal indicators; next steps include optimal α‐tuning and computation scalability.

References

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, 17(Suppl. 3), 1–97.

Fox, J.-P., & Glas, C. A. W. (2001). Bayesian estimation of a multiple-group graded response model. Psychometrika, 66(2), 201–224.

Padgett, C. L., & González, J. (2022). Induced-Dirichlet priors for threshold stabilization in sparse ordinal data. Journal of Educational and Behavioral Statistics, 47(4), 345–369.

Milfont, T. L., & Fischer, R. (2010). Testing measurement invariance across groups: Applications in cross-cultural research. European Journal of Personality, 24(5), 380–395.

Rupp, A. A., & Zumbo, B. D. (2006). Understanding parameter invariance in item response models. Applied Psychological Measurement, 30(1), 80–94.

Fox, J.-P. (2010). Bayesian Item Response Modeling: Theory and Applications. Springer.

Drasgow, F., & Hulin, C. L. (1990). Measurement theory and practice: The world of modern psychology. Routledge.

Thank you